Applications
Objectives
Methods
Interpretation
February 22, ’23
aid in discovery of new populations of imperiled plants
aid in creation of reserves under climate change models
aid in predicting joint species distributions, e.g. obligate mutualisms
using known occurrences of a species, identify areas which have similar habitat and the potential to support populations
but, what about dispersal?
competition?
mutualisms?
Besseya (=Synthyris) alpina (A. Gray) Rydberg.
American Basin
B. alpina, Franklin #3948
define spatial domain and grain
software environments
dependent variables
independent variables
modelling approaches
model evaluation
predicting a model into space
domain; spatial extent of study
- administrative boundary
- ecological model
grain; scales in space and time
- resolution at which process occurs (space)
- current and past climate (time)
- projected climates
- (animals) seasonal patterns?
limitation: compute power
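To make the compute limitation concrete, the rough sketch below counts raster cells for a domain at a given grain. The domain areas are round, illustrative numbers (assumptions), not measurements from this study, and `n_cells` is a hypothetical helper.

```python
# Rough cell counts for a square-celled raster: cells = area / cell_size^2.
# The areas below are illustrative round numbers (assumptions).

def n_cells(area_km2: float, cell_size_m: float) -> int:
    """Approximate number of raster cells covering an area at a given cell size."""
    cell_area_km2 = (cell_size_m / 1000.0) ** 2
    return round(area_km2 / cell_area_km2)

domains = {
    "continental, 4 km grain": (20_000_000, 4000),  # hypothetical area in km^2
    "regional, 1 km grain": (150_000, 1000),        # hypothetical area in km^2
    "fine, 1 m grain": (1, 1),                      # hypothetical area in km^2
}

for name, (area, cell) in domains.items():
    print(f"{name}: ~{n_cells(area, cell):,} cells per predictor layer")
```

Each predictor layer, and each time step, multiplies this count, which is why a fine grain over a large domain quickly becomes expensive.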
Domain
R
GRASS GIS: many modules for creating predictors
QGIS: graphical user interface for mouse-guided visualization
occurrences of a species in space (and time)
Linear models:
Occurrence Records
just create some random red points and add em to the map above.
domain: continental (e.g. North America)
- maximum and minimum daily temperatures [monthly, 4km]
- precipitation [monthly, 4km]
- hydrologic drainage [millennial, 4km]
domain: regional (e.g. Southern Rockies)
- elevation [millennial, 1km]
- soil classes [millennial, 1km]
- solar radiation [millennial, 1km]
- precipitation form [monthly, 1km]
domain: fine (e.g. McDonald Woods)
- micro topography [decade, 1m]
- water relations [decade, 1m]
- shade [weekly, 1m]
- soils [decade, 1m]
Percent bedrock (rocky, young soils)
Elevation (alpine habitat)
Bare ground (few other plants?)
X-Y coords (alpine zone decreases with latitude)
Soil surface pH (calcareous bedrock?)
Precipitation as snow (monsoonal influence?)
explicitly check for variation
carefully encode categorical data
too much, may not be useful
too little, may not be useful
pilot knock out studies; use one variable leaving the others out
warrants simplifying a variable?
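The checks above can be sketched as a simple screen over candidate predictors. This is a minimal illustration, not a prescribed workflow: the thresholds, the predictor names, and the values are all hypothetical.

```python
from statistics import pvariance

def screen_predictors(table, min_variance=1e-8, max_levels=20):
    """Flag numeric predictors with too little variation and
    categorical predictors with too few or too many levels."""
    flags = {}
    for name, values in table.items():
        if all(isinstance(v, (int, float)) for v in values):
            # numeric predictor: a (near) constant column cannot explain anything
            flags[name] = "too little variation" if pvariance(values) <= min_variance else "ok"
        else:
            # categorical predictor: count the distinct levels
            n_levels = len(set(values))
            if n_levels <= 1:
                flags[name] = "too little variation"
            elif n_levels > max_levels:
                flags[name] = "too many levels"
            else:
                flags[name] = "ok"
    return flags

# Hypothetical values extracted at five occurrence points
predictors = {
    "elevation_m": [3650, 3720, 3810, 3590, 3900],
    "soil_ph": [7.1, 7.1, 7.1, 7.1, 7.1],
    "soil_class": ["A", "A", "B", "A", "B"],
}
flags = screen_predictors(predictors)
print(flags)
```

A categorical predictor flagged for too many levels is a candidate for simplifying, for example by merging rare classes.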
problems with all models
- garbage in -> garbage out
- influential outliers
with machine learning:
- models can fixate on these observations
solution:
- run many models, synthesize the results
“we are stronger together than we are alone” - Walter Payton
correlated!
all evaluation performed by computer – too much information
much more common approach than individual linear models
species distributions are generally too complex for individual predictors, and building fully interactive terms would take a long time.
the typical approach since the late 1990s
do the work for you
none, get a few observations, the more the merrier.
train/test split (partition data)
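A train/test split can be sketched as a seeded random partition of the records. The 70/30 ratio, the seed, and the record layout are assumptions chosen for illustration.

```python
import random

def train_test_split(records, test_fraction=0.3, seed=42):
    """Randomly partition records into a training set and a testing set."""
    rng = random.Random(seed)      # seeded so the partition is reproducible
    shuffled = records[:]          # copy; leave the input untouched
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical presence/absence records
records = [{"id": i, "present": i % 2} for i in range(10)]
train, test = train_test_split(records)
print(len(train), "training records;", len(test), "testing records")
```

The model is fitted on `train` only; `test` is held back so that evaluation uses records the model has never seen.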
no free lunch
try many types of models, select some that work for your application
commonly implemented:
\[ Accuracy = \frac{\text{correct classifications}}{\text{all classifications}} \]
\[ Sensitivity = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}} \]
Sensitivity: probability of the method giving a positive result when the test subject is positive.
\[ Specificity = \frac{\text{true negatives}}{\text{true negatives} + \text{false positives}} \]
Specificity: probability of the method giving a negative result when the test subject is negative.
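As a worked example of these three metrics, the sketch below scores a set of predictions against held-out observations (1 = present, 0 = absent). The observed and predicted vectors are made up for illustration.

```python
def evaluate(observed, predicted):
    """Accuracy, sensitivity and specificity for binary presence/absence data."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)  # true positives
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)  # true negatives
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)  # false positives
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical test-set records
observed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
scores = evaluate(observed, predicted)
print(scores)
```

Here 3 of 4 presences and 4 of 6 absences are recovered, so sensitivity is 0.75 while specificity is about 0.67, even though accuracy alone (0.7) hides that difference.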
Occurrence Records
keep a lab notebook; this is bench science
always start models small (avoid computer crashes)
use strong and discrete directory organization
scratch paper, whiteboards, flowcharts
dynamic programming; import/export data
track code on GitHub
using known occurrences of a species, identify areas which have similar habitat and the potential to support populations
mutualisms?
modelling two species together, joint-SDMs
competition?
using suitability surfaces to plant assemblages of species in field experiments
but, what about dispersal?
using gridded surfaces to model the probability of dispersal from known to suitable habitat
two hour discussion of the ‘sdm’ package by an author
large repository for high throughput modelling
large repository about spatial data in R
short activity using an SDM-like process to teach spatial data
Ensemble learning combines many decision trees, each composed of many binary decisions, into a single model. Each independent variable (or feature) may become a node on a tree, i.e. a point where a binary decision moves an observation towards a predicted outcome. Each decision tree in the ensemble is a weak model: it may suffer from high variance or bias, yet it produces better outcomes than would be expected by chance. When ensembled, these weak models form a strong model, one with more appropriately balanced variance and bias, whose predictions are more strongly correlated with the expected values than those of the individual weak models.
Random Forest (RF): the training data are repeatedly bootstrap re-sampled and, in combination with random subsets of features, used to create nodes which attempt to optimally predict a known outcome. A large number of trees are grown this way and then aggregated, with the most common prediction across trees taken as the final classification. Each individual tree is generated independently of the others.
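The bootstrap, random feature subset, and majority vote steps can be sketched in a few lines. This is a toy illustration, not a production Random Forest: each "tree" is a single-split decision stump, and the data (elevation in metres, percent bedrock, presence) are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, labels, features):
    """Find the one feature/threshold split with the fewest misclassifications."""
    best = None
    for f in features:
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    return best[1:]

def predict_stump(stump, row):
    f, threshold, left_label, right_label = stump
    return left_label if row[f] <= threshold else right_label

def fit_forest(rows, labels, n_trees=25, n_features=1, seed=1):
    """Grow each stump independently on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    all_features = list(range(len(rows[0])))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap: sample rows with replacement
        feats = rng.sample(all_features, n_features)    # random subset of features
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest

def predict_forest(forest, row):
    """Aggregate the stumps by majority vote."""
    votes = Counter(predict_stump(s, row) for s in forest)
    return votes.most_common(1)[0][0]

# Hypothetical training data: [elevation_m, percent_bedrock] -> presence (1) / absence (0)
rows = [[3900, 60], [3800, 55], [3700, 70], [3850, 65],
        [2500, 10], [2700, 20], [2600, 15], [2400, 5]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

forest = fit_forest(rows, labels)
print(predict_forest(forest, [3950, 80]), predict_forest(forest, [2300, 2]))
```

Because every stump sees a different resample and a different feature, no single unusual observation dominates the vote.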
Boosted Regression Tree (BRT), or gradient boosted tree: an initial tree is grown, and all other trees are grown sequentially after it. As each new tree is grown, the errors in responses left by the previous trees are weighted more heavily, so that the model focuses on selecting independent variables which refine predictions. All response data and predictor variables are kept available to all trees.
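The sequential idea can be sketched with a toy booster. Note the assumption: this uses the residual-fitting form of gradient boosting with squared error, where each new tree is fitted to what the previous trees got wrong, rather than explicit re-weighting of observations. The trees are single-split stumps on one predictor, and the data are hypothetical.

```python
def fit_regression_stump(xs, residuals):
    """Find the threshold that minimises squared error of the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the maximum so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def predict_boosted(model, x):
    """Base prediction plus the shrunken contribution of every stump."""
    base, rate, stumps = model
    total = base
    for threshold, left_mean, right_mean in stumps:
        total += rate * (left_mean if x <= threshold else right_mean)
    return total

def fit_boosted(xs, ys, n_trees=50, rate=0.1):
    """Grow stumps one after another, each on the residuals of the current model."""
    base = sum(ys) / len(ys)  # start from the mean response
    stumps = []
    for _ in range(n_trees):
        current = (base, rate, stumps)
        residuals = [y - predict_boosted(current, x) for x, y in zip(xs, ys)]
        stumps.append(fit_regression_stump(xs, residuals))
    return (base, rate, stumps)

# Hypothetical data: elevation (m) -> presence (1) / absence (0)
xs = [2400, 2600, 2800, 3000, 3400, 3600, 3800, 4000]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

model = fit_boosted(xs, ys)
print(round(predict_boosted(model, 3900), 3), round(predict_boosted(model, 2500), 3))
```

Unlike the independently grown trees of a Random Forest, each stump here depends on all the stumps before it; the learning rate (`rate`, an assumed value) controls how much each one is allowed to correct.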
Hijmans, Robert J. 2022. terra: Spatial Data Analysis. https://CRAN.R-project.org/package=terra.
Kuhn, Max. 2022. caret: Classification and Regression Training. https://CRAN.R-project.org/package=caret.
Naimi, Babak, and Miguel B. Araújo. 2016. “sdm: A Reproducible and Extensible R Platform for Species Distribution Modelling.” Ecography 39: 368–75. https://doi.org/10.1111/ecog.01881.
Pebesma, Edzer. 2018. “Simple Features for R: Standardized Support for Spatial Vector Data.” The R Journal 10 (1): 439–46. https://doi.org/10.32614/RJ-2018-009.